

# Gowin FPGA-based DDR2&DDR3

# Hardware Design Reference Manual

TN662-1.2E, 12/28/2023

Copyright © 2023 Guangdong Gowin Semiconductor Corporation. All Rights Reserved.

GOWIN, Gowin, and GOWINSEMI are trademarks of Guangdong Gowin Semiconductor Corporation and are registered in China, the U.S. Patent and Trademark Office, and other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders. No part of this document may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written consent of GOWINSEMI.

### Disclaimer

GOWINSEMI assumes no liability and provides no warranty (either expressed or implied) and is not responsible for any damage incurred to your hardware, software, data, or property resulting from usage of the materials or intellectual property except as outlined in the GOWINSEMI Terms and Conditions of Sale. All information in this document should be treated as preliminary. GOWINSEMI may make changes to this document at any time without prior notice. Anyone relying on this documentation should contact GOWINSEMI for the current documentation and errata.

## **Revision History**

| Date            | Version | Description                                |
|-----------------|---------|--------------------------------------------|
| 02/11/2019      | 1.0E    | Initial version published.                 |
| 01/29/2021      | 1.1E    | DDR3 related contents added.               |
| 12/28/2023      | 1.2E    | GW5A/GW5AT/GW5AST/GW5AR/GW5AS DDR3 related contents added. |

# **Contents**

- List of Figures
- List of Tables
- 1 About This Guide
  - 1.1 Purpose
  - 1.2 Related Documents
  - 1.3 Terminology and Abbreviations
  - 1.4 Support and Feedback
- 2 FPGA I/O Distribution
  - 2.1 FPGA I/O Distribution
    - 2.1.1 GW2A series of FPGA Products
    - 2.1.2 GW5A-25/GW5AR-25/GW5AS-25
    - 2.1.3 GW5AT-138/GW5AST-138/GW5A-138/GW5AS-138
  - 2.2 DDR3 I/O Requirement
  - 2.3 I/O Distribution Rule
  - 2.4 FPGA I/O Mode
    - 2.4.1 GW2A series of FPGA Products
    - 2.4.2 GW5A/GW5AT/GW5AST/GW5AS series of FPGA Products
  - 2.5 I/O Distribution
    - 2.5.1 GW2A series of FPGA Products
    - 2.5.2 GW5A-25/GW5AR-25/GW5AS-25
    - 2.5.3 GW5AT-138/GW5AST-138/GW5A-138/GW5AS-138
- 3 Schematic Design
  - 3.1 Power Module
    - 3.1.1 VDD and VDDQ Power Module
    - 3.1.2 VREFCA, VREFDQ and 0.75V Pull-up Power Module
  - 3.2 FPGA Module
    - 3.2.1 Bank Voltage Distribution
    - 3.2.2 Bank I/O Distribution
  - 3.3 DDR3 Design
- 4 PCB Design
  - 4.1 Point-to-Point Topological Structure
    - 4.1.1 Devices Placement
    - 4.1.2 Point-to-Point Topology
  - 4.2 PCB Structure
    - 4.2.1 8-Layer Stackup Structure
    - 4.2.2 6-Layer Stackup Structure
  - 4.3 Power Delivery Network
    - 4.3.1 PDN
    - 4.3.2 Decoupling Capacitor
    - 4.3.3 Power Routing
  - 4.4 Signals Routing
    - 4.4.1 Signal Grouping
    - 4.4.2 The Same Layer Routing
    - 4.4.3 Trace Width
    - 4.4.4 Spacing
    - 4.4.5 Length
    - 4.4.6 Equal Length
    - 4.4.7 Impedance Matching
  - 4.5 The Reference Plane
    - 4.5.1 Continuous Reference Plane
    - 4.5.2 Reference Plane Suture
    - 4.5.3 Signal Return Path
  - 4.6 Simulation
    - 4.6.1 Simulation
    - 4.6.2 Timing Budget Design
  - 4.7 Conclusion
- 5 Notes

# **List of Figures**

- Figure 1-1 Connection Diagram of Gowin FPGA and DDR3
- Figure 2-1 Available Banks for DDR3 in the GW2A Series of FPGA Products
- Figure 2-2 Available Banks for DDR3 in GW5A-25/GW5AR-25/GW5AS-25
- Figure 2-3 Available Banks for DDR3 in GW5AT-138/GW5AST-138/GW5A-138/GW5AS-138
- Figure 3-1 VDD/VDDQ Power Module Schematic
- Figure 3-2 VREFCA/VREFDQ Power Module Schematic
- Figure 3-3 GW2A18 FPGA Bank Voltage Distribution Schematic
- Figure 3-4 Bank4 I/O Distribution Schematic
- Figure 3-5 Bank5 I/O Distribution Schematic
- Figure 3-6 Bank6 I/O Distribution Schematic
- Figure 3-7 DDR3 Schematic
- Figure 4-1 Schematic Diagram of Gowin FPGA and DDR3 Placement
- Figure 4-2 Point-to-Point Connection Topology of Gowin FPGA and DDR3
- Figure 4-3 8-Layer Stackup Structure
- Figure 4-4 6-Layer Stackup Structure
- Figure 4-5 VTT Terminal Resistance Interval Decoupling Capacitor Placement Reference Diagram
- Figure 4-6 Copper Design of the Power and Ground
- Figure 4-7 Intra-group and Inter-group Spacing Signal
- Figure 4-8 Data Bus Termination Scheme
- Figure 4-9 Trace Spanning Split
- Figure 4-10 Margin between Trace and Hollowing out Area
- Figure 4-11 Routing in the Pin Area
- Figure 4-12 Ground Plane Suture
- Figure 4-13 Add Ground Via to Crowded Area (Green for Ground, Red for Power Supply)
- Figure 4-14 Command, Address, Control Line Simulation and Contrast
- Figure 4-15 Jumper Capacitance

TN662-1.2E iii

# **List of Tables**

- Table 1-1 Terminology and Abbreviations
- Table 2-1 DDR3 Signal
- Table 2-2 DDR3 I/O Assignment
- Table 2-3 DDR3 I/O Assignment
- Table 2-4 DDR3 I/O Assignment
- Table 2-5 DDR3 I/O Assignment
- Table 2-6 DDR3 I/O Assignment
- Table 4-1 DDR3 Signal Grouping
- Table 4-2 Address Timing Budget Example


1 About This Guide 1.1 Purpose

# 1 About This Guide

## 1.1 Purpose

Using DDR3 devices as the example, this manual describes the hardware design method for high-speed memory circuits. It covers I/O distribution, schematic design, power network design, PCB routing, reference plane design, and simulation. The goal is to help you quickly complete a high-speed memory hardware design that has good signal integrity, low power consumption, and low noise.

For DDR2 designs, follow the DDR3 method described here, because the two architectures differ very little. The main differences are that DDR3 devices run at a higher bus speed and use a 1.5V supply, while DDR2 devices use a 1.8V supply.

This manual uses a proven, stable design to walk systematically through the connection between a Gowin FPGA and DDR3. The FPGA is the GW2A-LV18PG256. The memory is the Micron MT41J128M16JT-125:K, a single-die-package (SDP) DDR3 SDRAM. The two devices are connected point to point, as shown in Figure 1-1.


[Figure: GOWIN Arora FPGA connected to a 2Gbit DDR3 SDRAM through DDR3\_A[13..0], DDR3\_BA[2..0], DDR3\_DQ[15..0], DDR3\_LDM, DDR3\_LDQSp/DDR3\_LDQSn, DDR3\_UDM, DDR3\_UDQSp/DDR3\_UDQSn, DDR3\_CASn, DDR3\_RASn, DDR3\_WEn, DDR3\_ODT, DDR3\_CSn, DDR3\_RSTn, DDR3\_CK\_EN, and DDR3\_CKp/DDR3\_CKn]

Figure 1-1 Connection Diagram of Gowin FPGA and DDR3

## 1.2 Related Documents

The latest user guides are available on the GOWINSEMI website. You can find the related documents at [www.gowinsemi.com](https://www.gowinsemi.com):

- DS102, GW2A series of FPGA Products Data Sheet
- DS981, GW5AT series of FPGA Products Data Sheet
- DS1103, GW5A series of FPGA Products Data Sheet
- DS1104, GW5AST series of FPGA Products Data Sheet
- DS1114, GW5AS-138 Data Sheet
- DS1103E, GW5A series of FPGA Products Data Sheet
- DS1108E, GW5AR series of FPGA Products Data Sheet
- DS1115E, GW5AS-25 Data Sheet

TN662-1.2E 2(36)

## 1.3 Terminology and Abbreviations

The abbreviations and terminology used in this manual are listed in Table 1-1.

Table 1-1 Terminology and Abbreviations

| Terminology and Abbreviations | Meaning                                                         |
|-------------------------------|-----------------------------------------------------------------|
| FPGA                          | Field Programmable Gate Array                                   |
| PG256                         | PBGA256 package                                                 |
| OSE8                          | A serializer with 8-bit parallel input and 1-bit serial output  |
| OSE8_MEM                      | 8-to-1 serializer with memory                                   |
| DDR                           | Double-Data-Rate Synchronous Dynamic Random Access Memory       |
| SSO                           | Simultaneous switching outputs                                  |
| SDP                           | Single die package                                              |
| ODT                           | On-Die Termination                                              |
| PDN                           | Power Delivery Network                                          |
| SI                            | Signal Integrity                                                |

## 1.4 Support and Feedback

Gowin Semiconductor provides customers with comprehensive technical support. If you have any questions, comments, or suggestions, please contact us directly through the following channels.

Website: www.gowinsemi.com/en

E-mail: support@gowinsemi.com


# 2 FPGA I/O Distribution

## 2.1 FPGA I/O Distribution

Before you allocate I/Os for the DDR3 device, identify which FPGA banks provide DQ resources that can be used for DDR3 DQ assignment.

### 2.1.1 GW2A series of FPGA Products

All 8 I/O banks in the GW2A series of FPGA products have DQ resources. Each bank contains 2 DQ clusters.

Figure 2-1 Available Banks for DDR3 in the GW2A Series of FPGA Products



### 2.1.2 GW5A-25/GW5AR-25/GW5AS-25

Four I/O banks in GW5A-25/GW5AR-25/GW5AS-25 have DQ resources. Each bank consists of 2 clusters of DQ resources.

Figure 2-2 Available Banks for DDR3 in GW5A-25/GW5AR-25/GW5AS-25

### 2.1.3 GW5AT-138/GW5AST-138/GW5A-138/GW5AS-138

Six I/O banks in GW5AT-138/GW5AST-138/GW5A-138/GW5AS-138 have DQ resources. Each bank contains 4 DQ clusters.

Figure 2-3 Available Banks for DDR3 in GW5AT-138/GW5AST-138/GW5A-138/GW5AS-138



## 2.2 DDR3 I/O Requirement

The MT41J128M16 device in a 96-ball FBGA package requires a total of 47 I/Os. If the CS# pin is permanently tied low, 46 I/Os are required.

The required I/O signals are listed in Table 2-1.


Table 2-1 DDR3 Signal

| Name       | I/O Required |
|------------|--------------|
| DQ[7:0]    | 8            |
| LDQS,LDQS# | 2            |
| LDM        | 1            |
| DQ[15:8]   | 8            |
| UDQS,UDQS# | 2            |
| UDM        | 1            |
| CK, CK#    | 2            |
| CKE        | 1            |
| A[13:0]    | 14           |
| BA[2:0]    | 3            |
| CS#        | 1            |
| RAS#       | 1            |
| CAS#       | 1            |
| WE#        | 1            |
| ODT        | 1            |
| Total      | 47           |
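The short Python sketch below cross-checks Table 2-1. It tallies the per-signal I/O counts and reproduces the 47-I/O total, or 46 when CS# is tied low. The dictionary simply restates the table, and the function name `required_ios` is hypothetical.

```python
"""Tally the FPGA I/Os needed for one x16 DDR3 device (restates Table 2-1)."""

# Signal group -> number of FPGA I/Os, copied from Table 2-1.
DDR3_X16_SIGNALS: dict[str, int] = {
    "DQ[7:0]": 8, "LDQS,LDQS#": 2, "LDM": 1,
    "DQ[15:8]": 8, "UDQS,UDQS#": 2, "UDM": 1,
    "CK,CK#": 2, "CKE": 1, "A[13:0]": 14, "BA[2:0]": 3,
    "CS#": 1, "RAS#": 1, "CAS#": 1, "WE#": 1, "ODT": 1,
}


def required_ios(signals: dict[str, int], cs_tied_low: bool = False) -> int:
    """Return the I/O count; CS# needs no FPGA pin if it is tied low on the board."""
    total = sum(signals.values())
    if cs_tied_low:
        total -= signals["CS#"]
    return total


if __name__ == "__main__":
    print(required_ios(DDR3_X16_SIGNALS))                    # 47
    print(required_ios(DDR3_X16_SIGNALS, cs_tied_low=True))  # 46
```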

## 2.3 I/O Distribution Rule

The DDR3 lower data group (DQ[7:0], LDQSn, LDQSp, and LDM) must be placed in a single FPGA DQ cluster. This design uses cluster DQ5.

The DDR3 upper data group (DQ[15:8], UDQSn, UDQSp, and UDM) must also be placed in a single FPGA DQ cluster. This design uses cluster DQ6.

The clock, command, and control group signals do not need to be in the same DQ cluster.

The differential clock signals CK and CK# must be assigned to a global differential clock pin pair of the FPGA.
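The rules above can be checked automatically against a pin map. The following is a minimal sketch under stated assumptions. The pin map, the function names, and the pin names `IOL29A`/`IOL29B` used as a global clock pair are hypothetical placeholders. Real cluster and clock-pin data must come from the device pinout.

```python
"""Check the Section 2.3 rules against a pin map (illustrative data only)."""

# Byte lanes that must each stay inside one FPGA DQ cluster.
LOWER_LANE = [f"DQ{i}" for i in range(8)] + ["LDQSp", "LDQSn", "LDM"]
UPPER_LANE = [f"DQ{i}" for i in range(8, 16)] + ["UDQSp", "UDQSn", "UDM"]


def lane_violations(cluster_of: dict[str, str]) -> list[str]:
    """Report every byte lane whose signals span more than one DQ cluster."""
    problems = []
    for lane_name, lane in (("lower", LOWER_LANE), ("upper", UPPER_LANE)):
        clusters = sorted({cluster_of[sig] for sig in lane})
        if len(clusters) != 1:
            problems.append(f"{lane_name} lane spans clusters {clusters}")
    return problems


def clock_on_global_pair(pin_of: dict[str, str], global_clock_pins: set[str]) -> bool:
    """CK and CK# must both land on FPGA global clock pins."""
    return pin_of["CK"] in global_clock_pins and pin_of["CK#"] in global_clock_pins


if __name__ == "__main__":
    # Hypothetical assignment mirroring this design: lower lane in DQ5, upper in DQ6.
    clusters = {sig: "DQ5" for sig in LOWER_LANE}
    clusters.update({sig: "DQ6" for sig in UPPER_LANE})
    print(lane_violations(clusters))                   # []
    print(lane_violations(dict(clusters, LDM="DQ7")))  # one violation in the lower lane
    gclk = {"IOL29A", "IOL29B"}  # hypothetical global clock pair
    print(clock_on_global_pair({"CK": "IOL29A", "CK#": "IOL29B"}, gclk))  # True
```

Moving a single strobe or mask pin out of its lane's cluster is enough to trigger a violation, and that is the kind of mistake this check is meant to catch early.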

## 2.4 FPGA I/O Mode

### 2.4.1 GW2A series of FPGA Products

The CLK Ratio of the DDR3 controller supports both 1:2 mode and 1:4 mode.

In 1:4 mode, suppose the internal clock frequency of the FPGA is 100MHz and the I/O clock frequency is 400MHz. Data is sampled on both rising and falling edges, so each data line runs at 800Mbps. Common FPGA GPIOs also support this mode, which makes efficient use of the FPGA's GPIO resources.

In 1:4 mode, every single-ended and differential signal connected to DDR3 occupies one differential pair on the FPGA. If the package bonds out only one pin of a differential pair, that pin can still be used in 1:4 mode. The pairs do not need to be true LVDS pairs. For example, if the DQ13 signal line is connected to pin IOB32A, pin IOB32B can no longer be connected to any other DDR3 signal. This mode supports higher data rates.

To support high-speed transmission, this design requires the FPGA to work in 1:4 mode.
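The 1:4 arithmetic above can be sketched as follows. The fabric clock is multiplied by the CLK Ratio to get the I/O clock, and that result is doubled because DDR transfers data on both edges. The function names are hypothetical. The bandwidth figure is a theoretical peak that ignores refresh, bank turnaround, and other protocol overhead.

```python
"""Per-line data rate and peak bus bandwidth for a given CLK Ratio."""


def line_rate_mbps(fabric_clock_mhz: float, clk_ratio: int) -> float:
    """I/O clock = fabric clock x ratio; DDR moves one bit per clock edge."""
    io_clock_mhz = fabric_clock_mhz * clk_ratio
    return io_clock_mhz * 2


def peak_bandwidth_mbyte_s(fabric_clock_mhz: float, clk_ratio: int, dq_width: int) -> float:
    """Theoretical peak; refresh, bank turnaround, and protocol overhead are ignored."""
    return line_rate_mbps(fabric_clock_mhz, clk_ratio) * dq_width / 8


if __name__ == "__main__":
    print(line_rate_mbps(100, 4))              # 800 (Mbps per DQ line)
    print(peak_bandwidth_mbyte_s(100, 4, 16))  # 1600.0 (MB/s for a x16 bus)
```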

### 2.4.2 GW5A/GW5AT/GW5AST/GW5AS series of FPGA Products

The CLK Ratio of DDR3 controller supports 1:4 mode.

In 1:4 mode, if the internal clock frequency of the FPGA is 100MHz and the data frequency at the I/O ports is 400MHz, the data rate for each individual data line is 800Mbps after sampling on both rising and falling edges. This mode is supported by each GPIO on the FPGA.

## 2.5 I/O Distribution

Using a DDR3 controller with a 1:4 CLK Ratio as the example, this section summarizes the basic principles for allocating DDR3 I/Os on each device series. The actual number of I/Os bonded out by each package must also be taken into account.

### 2.5.1 GW2A series of FPGA Products

The recommended I/O assignment for one DDR3 2Gbit (16 Meg x 16 x 8 Banks) chip is shown in Table 2-2.

Table 2-2 DDR3 I/O Assignment

| No. | DDR3 I/O                                              | FPGA Bank   | Note                                                                                                                                                                                  |
|-----|-------------------------------------------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1   | DQ[7:0],LDQS,LDQS#,LDM                                | Bank6       |                                                                                                                                                                                       |
|     | DQ[15:8],UDQS,UDQS#,UDM                               | Bank7       |                                                                                                                                                                                       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank6,Bank7 | If the package prevents assigning all control signals within these two banks, they may be assigned to neighboring banks, provided the transfer bandwidth still meets the requirements of the use case. |
| 2   | DQ[7:0],LDQS,LDQS#,LDM                                | Bank0       |                                                                                                                                                                                       |
|     | DQ[15:8],UDQS,UDQS#,UDM                               | Bank1       |                                                                                                                                                                                       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank0,Bank1 | If the package prevents assigning all control signals within these two banks, they may be assigned to neighboring banks, provided the transfer bandwidth still meets the requirements of the use case. |
| 3   | DQ[7:0],LDQS,LDQS#,LDM                                | Bank2       |                                                                                                                                                                                       |
|     | DQ[15:8],UDQS,UDQS#,UDM                               | Bank3       |                                                                                                                                                                                       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank2,Bank3 | If the package prevents assigning all control signals within these two banks, they may be assigned to neighboring banks, provided the transfer bandwidth still meets the requirements of the use case. |
| 4   | DQ[7:0],LDQS,LDQS#,LDM                                | Bank4       |                                                                                                                                                                                       |
|     | DQ[15:8],UDQS,UDQS#,UDM                               | Bank5       |                                                                                                                                                                                       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank4,Bank5 | If the package prevents assigning all control signals within these two banks, they may be assigned to neighboring banks, provided the transfer bandwidth still meets the requirements of the use case. |
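The "16 Meg x 16 x 8 Banks" density used in these tables can be cross-checked from the address bits. This sketch assumes the MT41J128M16 organization given in Micron's data sheet: 14 row bits (A[13:0]), 10 column bits (A[9:0]), and 3 bank bits (BA[2:0]). The column width is not stated in this manual, so verify it against the data sheet before relying on it.

```python
"""Cross-check DDR3 density from its address organization."""

ROW_BITS = 14    # A[13:0] -> 16K rows
COL_BITS = 10    # A[9:0]  -> 1K columns (assumed from the Micron data sheet)
BANK_BITS = 3    # BA[2:0] -> 8 banks
DQ_WIDTH = 16    # x16 device


def density_bits(row_bits: int, col_bits: int, bank_bits: int, width: int) -> int:
    """Total bits = rows x columns x banks x data width."""
    return (1 << row_bits) * (1 << col_bits) * (1 << bank_bits) * width


if __name__ == "__main__":
    bits = density_bits(ROW_BITS, COL_BITS, BANK_BITS, DQ_WIDTH)
    print(bits // 2**30, "Gbit")            # 2 Gbit
    print(bits == 16 * 2**20 * 16 * 8)      # True: matches "16 Meg x 16 x 8 Banks"
```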


### 2.5.2 GW5A-25/GW5AR-25/GW5AS-25

The recommended I/O assignment for one DDR3 2Gbit (16 Meg x 16 x 8 Banks) chip is shown in Table 2-3.

Table 2-3 DDR3 I/O Assignment

| No. | DDR3 I/O                                                        | FPGA Bank | Notes |
|-----|-----------------------------------------------------------------|-----------|-------|
| 1   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM               | Bank1     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT           | Bank0     |       |
| 2   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM               | Bank2     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT           | Bank3     |       |
| 3   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM               | Bank6     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT           | Bank7     |       |

The recommended I/O assignment for two DDR3 2Gbit (16 Meg x 16 x 8 Banks) chips is shown in Table 2-4.

Table 2-4 DDR3 I/O Assignment

| No. | DDR3 I/O                                              | FPGA Bank       | Notes |
|-----|-------------------------------------------------------|-----------------|-------|
| 1   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM     | Bank2           |       |
|     | DQ[23:16],LDQS,LDQS#,LDM<br>DQ[31:24],UDQS,UDQS#,UDM  | Bank3           |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank1/<br>Bank4 |       |


### 2.5.3 GW5AT-138/GW5AST-138/GW5A-138/GW5AS-138

The recommended I/O assignment for two DDR3 2Gbit (16 Meg x 16 x 8 Banks) chips is shown in Table 2-5.

Table 2-5 DDR3 I/O Assignment

| No. | DDR3 I/O                                              | FPGA Bank | Notes |
|-----|-------------------------------------------------------|-----------|-------|
| 1   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM     | Bank2     |       |
|     | DQ[23:16],LDQS,LDQS#,LDM<br>DQ[31:24],UDQS,UDQS#,UDM  | Bank2     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank3     |       |
| 2   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM     | Bank4     |       |
|     | DQ[23:16],LDQS,LDQS#,LDM<br>DQ[31:24],UDQS,UDQS#,UDM  | Bank4     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank5     |       |
| 3   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM     | Bank6     |       |
|     | DQ[23:16],LDQS,LDQS#,LDM<br>DQ[31:24],UDQS,UDQS#,UDM  | Bank6     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank7     |       |
| 4   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM     | Bank7     |       |
|     | DQ[23:16],LDQS,LDQS#,LDM<br>DQ[31:24],UDQS,UDQS#,UDM  | Bank7     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank6     |       |

The recommended I/O assignment for four DDR3 2Gbit (16 Meg x 16 x 8 Banks) chips is shown in Table 2-6.

Table 2-6 DDR3 I/O Assignment

| No. | DDR3 I/O                                              | FPGA Bank | Notes |
|-----|-------------------------------------------------------|-----------|-------|
| 1   | DQ[7:0],LDQS,LDQS#,LDM<br>DQ[15:8],UDQS,UDQS#,UDM     | Bank4     |       |
|     | DQ[23:16],LDQS,LDQS#,LDM<br>DQ[31:24],UDQS,UDQS#,UDM  | Bank4     |       |
|     | DQ[39:32],LDQS,LDQS#,LDM<br>DQ[47:40],UDQS,UDQS#,UDM  | Bank5     |       |
|     | DQ[55:48],LDQS,LDQS#,LDM<br>DQ[63:56],UDQS,UDQS#,UDM  | Bank5     |       |
|     | CK, CK#,CKE,<br>A[13:0],BA[2:0],CS#,RAS#,CAS#,WE#,ODT | Bank6     |       |


# 3 Schematic Design

## 3.1 Power Module

This section covers only the power supplies required by the DDR3 device. For the FPGA power module, refer to the schematic of the corresponding development board.

### 3.1.1 VDD and VDDQ Power Module

An NCP3170ADR2G switching regulator is used. Its maximum output current is 3A.

Figure 3-1 VDD/VDDQ Power Module Schematic



### 3.1.2 VREFCA, VREFDQ and 0.75V Pull-up Power Module

A power chip dedicated to DDR devices is used. Its termination regulator operates in push-pull mode, so it can both source current (up to 4A) and sink current (up to 5A).




Figure 3-2 VREFCA/VREFDQ Power Module Schematic
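The 0.75V rail in this section's title follows from the DDR convention that VTT, VREFCA, and VREFDQ all sit at half of VDDQ. The sketch below computes the nominal values for DDR3 (1.5V) and DDR2 (1.8V). It gives nominal targets only; take the tolerance limits from the JEDEC standard or the memory data sheet. The function name is hypothetical.

```python
"""Nominal DDR termination and reference voltages (half of VDDQ)."""


def termination_rails(vddq: float) -> dict[str, float]:
    """VTT, VREFCA, and VREFDQ are all nominally VDDQ / 2."""
    half = vddq / 2
    return {"VTT": half, "VREFCA": half, "VREFDQ": half}


if __name__ == "__main__":
    print(termination_rails(1.5))  # DDR3: {'VTT': 0.75, 'VREFCA': 0.75, 'VREFDQ': 0.75}
    print(termination_rails(1.8))  # DDR2: 0.9 V on each rail
```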

## 3.2 FPGA Module

This section covers only two aspects of the FPGA module: the bank voltage distribution and the assignment of the I/Os connected to DDR3.

### 3.2.1 Bank Voltage Distribution

The core voltage of the Gowin FPGA is 1.0V. Bank4, Bank5, and Bank6, which connect to the DDR3 chip, are powered at 1.5V. The voltages of the other banks are assigned as required.




Figure 3-3 GW2A18 FPGA Bank Voltage Distribution Schematic

### 3.2.2 Bank I/O Distribution

In 1:4 mode, every single-ended and differential signal connected to DDR3 occupies one differential pair on the FPGA. The DDR3 differential clock is assigned to an FPGA global clock pin. The remaining unused pins of these pairs can serve only as simple inputs, such as keys or switches.
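Because of this one-pair-per-signal rule, the pair budget is larger than the raw signal count. The sketch below applies the rule to Table 2-1. Each single-ended signal costs a whole pair, and each true differential signal (LDQS, UDQS, CK) costs one pair. The signal grouping restates Table 2-1; the function name is hypothetical. The pin figure is an upper bound, because packages that bond out only one pin of a pair block fewer physical pins.

```python
"""Differential-pair budget for 1:4 mode (every DDR3 signal takes one FPGA pair)."""

# Single-ended DDR3 signals from Table 2-1 (each one still consumes a whole pair).
SINGLE_ENDED = {
    "DQ[15:0]": 16, "LDM": 1, "UDM": 1, "CKE": 1, "A[13:0]": 14,
    "BA[2:0]": 3, "CS#": 1, "RAS#": 1, "CAS#": 1, "WE#": 1, "ODT": 1,
}
# True differential signals: each uses both pins of one pair.
DIFFERENTIAL = ["LDQS/LDQS#", "UDQS/UDQS#", "CK/CK#"]


def pairs_needed(single_ended: dict[str, int], differential: list[str]) -> int:
    """One pair per single-ended signal plus one pair per differential signal."""
    return sum(single_ended.values()) + len(differential)


if __name__ == "__main__":
    pairs = pairs_needed(SINGLE_ENDED, DIFFERENTIAL)
    print(pairs, "pairs,", 2 * pairs, "pins")  # 44 pairs, 88 pins
```

The 41 partner pins left behind by single-ended signals are the ones the text restricts to simple inputs.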


Figure 3-4 Bank4 I/O Distribution Schematic



Figure 3-5 Bank5 I/O Distribution Schematic




[Figure: GW2A-18K-PBGA256 IO Bank 6 connections on pins IOL29 to IOL53, showing DDR3 address signals (including DDR3\_A4, DDR3\_A5, DDR3\_A7, DDR3\_A9, and DDR3\_A13), DDR3\_RSTn, switches SW1 to SW4, the GCLKT\_6/GCLKC\_6 global clock pins, the LPLL2 input/feedback pins, and the configuration pins]

Figure 3-6 Bank6 I/O Distribution Schematic

## 3.3 DDR3 Design

The DQ/DQS signals of DDR3 devices support ODT, so no external termination resistors are required on them. There is no official requirement to fit 49.9 Ω termination resistors (to VTT) on the ADDR/CMD/CNTRL signal lines, but in practice they have been shown to offer many advantages. The differential clock uses a 100 Ω impedance-matching resistor.

Figure 3-7 DDR3 Schematic
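If the optional 49.9 Ω terminations to VTT are fitted, a quick DC estimate shows how much current the VTT regulator must handle. Whether a driver is high (VDDQ) or low (0V), roughly VTT = VDDQ/2 appears across the resistor. The sketch below makes several assumptions. It treats all 23 ADDR/CMD/CNTRL lines of Table 2-1 as terminated, and the 40 Ω driver impedance is a hypothetical value. It is a DC upper bound, not a substitute for simulation.

```python
"""Rough worst-case DC current drawn through the VTT termination network."""


def vtt_current_a(n_lines: int, vtt_v: float = 0.75, r_term_ohm: float = 49.9,
                  r_driver_ohm: float = 0.0) -> float:
    """Each terminated line sees about VTT across R_term plus the driver impedance."""
    per_line_a = vtt_v / (r_term_ohm + r_driver_ohm)
    return n_lines * per_line_a


if __name__ == "__main__":
    # Hypothetical count: A[13:0] + BA[2:0] + CS#, RAS#, CAS#, WE# + CKE + ODT = 23 lines.
    n = 14 + 3 + 4 + 1 + 1
    print(f"{vtt_current_a(n) * 1000:.0f} mA")                     # 346 mA, ideal driver
    print(f"{vtt_current_a(n, r_driver_ohm=40.0) * 1000:.0f} mA")  # 192 mA
```

Even the ideal-driver bound stays far below the 4A source and 5A sink ratings quoted in Section 3.1.2.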

# 4 PCB Design

## 4.1 Point-to-Point Topological Structure

### 4.1.1 Devices Placement

Device placement directly affects how difficult routing will be. Good placement makes routing easier and shorter and reduces the number of vias, which ultimately improves signal quality. The optimal placement and routing for a single device is shown in Figure 4-1.

Figure 4-1 Schematic Diagram of Gowin FPGA and DDR3 Placement




### 4.1.2 Point-to-Point Topology

The topological structure of Gowin FPGA connected to a single DDR3 chip is shown in Figure 4-2.

Figure 4-2 Point-to-Point Connection Topology of Gowin FPGA and DDR3



## 4.2 PCB Structure

### 4.2.1 8-Layer Stackup Structure

A well-designed PCB stackup is the key to suppressing switching noise. The ground layers must provide a low-impedance return path for the digital circuits, so route every signal with reference to a ground layer wherever possible. This design uses the 8-layer stackup shown in Figure 4-3:

- Layers 1, 3, 6, and 8 are signal layers.
- Layers 2 and 7 are ground layers.
- Layer 5 is the power layer.
- Layer 4 is shared by power and ground.

The high-speed DDR3 data lines are routed on layer 3 and use the ground plane on layer 2 as their reference plane. This keeps the vias shallow and reduces crosstalk. The address and control lines are routed on layers 6 and 8. Routing high-speed signals on a lower layer requires deeper vias, which cause more coupling and jitter.


| Layer ID | Layer Name | Type                 | Material | Thickness (mil) |
|----------|------------|----------------------|----------|-----------------|
| 1        | TOP        | Signal (surface)     | Copper   | 1.2             |
|          |            | Dielectric           | FR-4     | 8               |
| 2        | G2         | Plane (ground)       | Copper   | 1.2             |
|          |            | Dielectric           | FR-4     |                 |
| 3        | S3         | Signal               | Copper   | 1.2             |
|          |            | Dielectric           | FR-4     | 8               |
| 4        | PG4        | Plane (power/ground) | Copper   | 1.2             |
|          |            | Dielectric           | FR-4     | 8               |
| 5        | P5         | Plane (power)        | Copper   | 1.2             |
|          |            | Dielectric           |          |                 |
| 6        | S6         | Signal               | Copper   | 1.2             |
|          |            | Dielectric           | FR-4     | 8               |
| 7        | G7         | Plane (ground)       | Copper   | 1.2             |
|          |            | Dielectric           | FR-4     | 8               |
| 8        | BOTTOM     | Signal (surface)     | Copper   | 1.2             |

Figure 4-3 8-Layer Stackup Structure

### 4.2.2 6-Layer Stackup Structure

A PCB stackup of at least six layers is recommended. The number of signal layers required depends on the number of memory chips, the number of signals, and the signal spacing. Verifying signal integrity through simulation is recommended. Figure 4-4 shows a 6-layer PCB with four inner layers, in which:

- Layers 1, 3, and 4 are signal layers.
- Layer 6 carries signals and power (VDD1).
- Layer 2 is ground.
- Layer 5 is power (VDD1).


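| Layer | Function | Copper Thickness |
|-------|----------|------------------|
| L1    | Signal 1 | 1.4 mil          |
| L2    | VSS      | 1 oz             |
| L3    | Signal 2 | 1 oz             |
| L4    | Signal 3 | 1 oz             |
| L5    | VDD      | 1 oz             |
| L6    | Signal 4 | 1.4 mil          |

The dielectric thickness is 4 mil between L1 and L2, L2 and L3, L4 and L5, and L5 and L6.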

Figure 4-4 6-Layer Stackup Structure
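A simple way to review both stackups is to list, for every signal layer, the plane layers directly above and below it, because those planes carry the return current. The sketch below encodes the layer roles stated in Sections 4.2.1 and 4.2.2 (top to bottom). It treats the 6-layer layer 6 as a signal layer. The helper name is hypothetical, and the check considers only adjacency, not dielectric thickness or impedance.

```python
"""List the adjacent reference planes for every signal layer in a stackup."""

# Layer roles taken from Sections 4.2.1 and 4.2.2 (top to bottom).
STACKUP_8L = ["signal", "ground", "signal", "power/ground", "power", "signal", "ground", "signal"]
STACKUP_6L = ["signal", "ground", "signal", "signal", "power", "signal"]


def reference_planes(stackup: list[str]) -> dict[int, list[int]]:
    """Map each signal layer (1-based) to the plane layers directly above/below it."""
    refs = {}
    for i, role in enumerate(stackup):
        if role != "signal":
            continue
        refs[i + 1] = [j + 1 for j in (i - 1, i + 1)
                       if 0 <= j < len(stackup) and stackup[j] != "signal"]
    return refs


if __name__ == "__main__":
    print(reference_planes(STACKUP_8L))  # {1: [2], 3: [2, 4], 6: [5, 7], 8: [7]}
    print(reference_planes(STACKUP_6L))  # {1: [2], 3: [2], 4: [5], 6: [5]}
```

In the 8-layer output, layer 3 (the DDR3 data layer) sits between two planes. In the 6-layer stack, layers 3 and 4 each have only one adjacent plane.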

## 4.3 Power Delivery Network

### 4.3.1 PDN

As the data transfer frequency increases, timing and noise margins shrink. The DDR3 data bus is 16 bits wide, so during operation many I/Os switch at the same time (simultaneous switching outputs, SSO). When SSO occurs, large currents are sourced from or sunk into the power delivery network, especially on the VDDQ and VSSQ rails. With a poorly designed PDN, SSO generates large supply noise, which leads to a range of timing problems.

Hardware design engineers should design a robust PDN for the system board so that every component on the board has a stable power supply. A good, reliable PDN design follows these rules:

1. Keep the path impedance from the power module to the FPGA and memory devices as low as possible. The lower the path impedance, the smaller the ripple.
2. Use copper pours if space permits. If space is limited, route VDD/VSS and VDDQ/VSSQ as wide as possible.
3. To improve noise immunity, VDD/VSS and VDDQ/VSSQ are isolated from each other inside the chip. Avoid sharing vias between VDD and VDDQ, or between VSS and VSSQ, wherever possible.
4. Place sufficient decoupling capacitors around the FPGA and memory devices to absorb high-frequency current spikes.
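Rule 1 can be made quantitative with the widely used target-impedance rule of thumb. This rule is not specified by this manual and is named here as an outside technique. The PDN impedance should stay below the allowed ripple voltage divided by the transient current step across the frequency range of interest. The 5% ripple allowance and 1A SSO step below are hypothetical numbers, not values from this design.

```python
"""Target-impedance rule of thumb for sizing a PDN (not specified by this manual)."""


def target_impedance_ohm(rail_v: float, ripple_fraction: float, transient_a: float) -> float:
    """Z_target = allowed ripple voltage / transient current step."""
    return rail_v * ripple_fraction / transient_a


if __name__ == "__main__":
    # Hypothetical budget: 1.5 V VDDQ, 5 % allowed ripple, 1 A SSO current step.
    z = target_impedance_ohm(1.5, 0.05, 1.0)
    print(f"{z * 1000:.1f} mOhm")  # 75.0 mOhm
```

A larger simultaneous-switching current step lowers the target, which is why wide bursts of SSO demand a lower-impedance PDN.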


### 4.3.2 Decoupling Capacitor

Sufficient decoupling capacitors must be placed at appropriate locations on the PCB to prevent transmission errors caused by excessive power supply noise. During operation, high-frequency data transmission generates considerable noise on the power supply. The capacitors suppress supply fluctuations and provide a current return path for the signals.

The DDR3 power supply is divided into the core supply (VDD/VSS) and the DQ data supply (VDDQ/VSSQ). Because the core usually operates at a lower frequency than the DQ data bus, the core decoupling capacitance is larger; capacitors between 100 nF and 1 µF are recommended. DDR3 devices usually integrate on-die capacitance and do not depend entirely on external decoupling, so it is not necessary to assign a capacitor to every power pin.

The DDR3 chip and the power chip are usually on the same PCB, and the power chip is surrounded by many capacitors of different values, so 100 nF decoupling capacitors can be used for VDD/VSS and VDDQ/VSSQ. Place one at each of the four corners of the chip, as close to the chip as possible. Place the connection vias between the capacitor and the chip, and match the trace width to the via dimensions.

For the VTT termination resistors of the address, command, and control signals, it is recommended to add one 100 nF capacitor for every four termination resistors, placed in line with the resistors, as shown in Figure 4-5.


Figure 4-5 VTT Terminal Resistance Interval Decoupling Capacitor Placement Reference Diagram
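The one-capacitor-per-four-resistors rule above can be turned into a quick count during placement. The Python sketch below assumes, hypothetically, that every address and control signal listed in Table 4-1 (A[13:0], BA[2:0], CKE, CS#, RAS#, CAS#, WE#, ODT) has its own VTT termination resistor; whether the CK/CK# pair is also VTT-terminated depends on the design and is not counted. The net lists and function names are illustrative, not part of any Gowin tool.

```python
import math

# Hypothetical VTT-terminated nets (assumption: one resistor per net),
# taken from the address and control groups of Table 4-1.
ADDRESS_NETS = [f"A{i}" for i in range(14)] + [f"BA{i}" for i in range(3)]
CONTROL_NETS = ["CKE", "CS#", "RAS#", "CAS#", "WE#", "ODT"]

RESISTORS_PER_CAP = 4  # one 100 nF capacitor per four VTT termination resistors


def vtt_decoupling_caps(num_resistors: int, per_cap: int = RESISTORS_PER_CAP) -> int:
    """Number of 100 nF VTT capacitors, rounding a partial group up to one more cap."""
    return math.ceil(num_resistors / per_cap)


def cap_groups(nets: list[str], per_cap: int = RESISTORS_PER_CAP) -> list[list[str]]:
    """Split the termination resistors into consecutive groups that share one capacitor."""
    return [nets[i:i + per_cap] for i in range(0, len(nets), per_cap)]


if __name__ == "__main__":
    nets = ADDRESS_NETS + CONTROL_NETS
    print(len(nets), "resistors ->", vtt_decoupling_caps(len(nets)), "capacitors")
    for group in cap_groups(nets):
        print(group)
```

Under these assumptions there are 23 resistors, which the rounding-up rule turns into six capacitors; the last group simply holds three resistors instead of four.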



### 4.3.3 Power Routing

Pour copper for the VDD, VDDQ, VSS, and VSSQ supplies wherever possible. Keep the traces connecting pads to their vias as short as possible, preferably under 8 mil. Make every connection from a supply to its via as wide as possible; a trace width of 20 mil is recommended to reduce trace impedance. The copper design of the power and ground is shown in Figure 4-6.




Figure 4-6 Copper Design of the Power and Ground

## 4.4 Signal Routing

### 4.4.1 Signal Grouping

To simplify routing, the DDR3 signals are divided into groups, which determines which signals are treated as intra-group and which as inter-group. DDR3 signals fall into five groups: the low byte data group, the high byte data group, the differential clock group, the address group, and the control group. The signals in each group are shown in the table below.

Table 4-1 DDR3 Signal Grouping

| Name                            | Signal Group             |
|---------------------------------|--------------------------|
| DQ[7:0], LDQS, LDQS#, LDM       | Low byte data group      |
| DQ[15:8], UDQS, UDQS#, UDM      | High byte data group     |
| CK, CK#                         | Differential clock group |
| A[13:0], BA[2:0]                | Address group            |
| CKE, CS#, RAS#, CAS#, WE#, ODT  | Control group            |


### 4.4.2 Same-Layer Routing

Following the signal grouping in the previous section, signals in the same group should be routed on the same layer to keep the impedance of intra-group signals continuous and consistent. This is especially important for the data groups, which have stricter impedance-continuity requirements. When data group signals change layers, they must use the same type of via and switch between the same layers, with no more than two vias per signal. Where possible, route the data groups, address, control, and clock signals on separate layers.

In addition, propagation delay differs between inner and outer layers because their dielectric constants differ. The dielectric of an inner layer is determined by the glass and resin of the PCB, whereas an outer layer is surrounded by a mix of materials with different properties: PCB material, solder mask, air, and so on. Generally, outer-layer routing has about 10% less propagation delay than inner-layer routing, so signals travel faster on outer layers.

### 4.4.3 Trace Width

Considering the available space and the impedance requirements of the whole board, choose a suitable trace width and trace spacing. In this design, single-ended signals use a 6 mil trace width with a target impedance of 50 $\Omega$; differential signals use a 4.5 mil trace width and 9 mil spacing with a target impedance of 100 $\Omega$.

### 4.4.4 Spacing

Both intra-group spacing and inter-group spacing affect signal integrity. As shown in Figure 4-7, Group1 and Group2 are signals of different groups, S1 is the intra-group spacing, S2 is the inter-group spacing, and S3 is the trace width.




Figure 4-7 Intra-group and Inter-group Spacing Signal

The recommended spacing depends on the thickness of the dielectric between the routing layer and its reference layer; a spacing of 3 times the dielectric thickness is generally recommended. The recommended intra-group spacing (S1) is 12 mil on average, with an 8 mil minimum. The recommended inter-group spacing (S2) is 20 mil, with an 8 mil minimum. Routing all signals at 8 mil over their entire length causes crosstalk that can degrade signal integrity (SI); if only short segments are routed at the minimum spacing, the effect on SI is small.
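The spacing rules above can be expressed as a small design-rule check. This is only a sketch: the 3x factor and the 12 mil, 20 mil, and 8 mil values come from the text, while the function names and the three-way classification ("ok", "minimum", "violation") are illustrative assumptions. With the 4 mil dielectric of the stackup in Figure 4-4, the 3x rule gives 12 mil, which agrees with the recommended average intra-group spacing.

```python
# Spacing guideline values from Section 4.4.4 (all values in mil).
SPACING_FACTOR = 3.0        # recommended spacing = 3 x dielectric thickness
INTRA_GROUP_TYPICAL = 12.0  # recommended average intra-group spacing (S1)
INTER_GROUP_TYPICAL = 20.0  # recommended inter-group spacing (S2)
MIN_SPACING = 8.0           # minimum for both S1 and S2


def recommended_spacing(dielectric_mil: float) -> float:
    """Spacing suggested by the 3x-dielectric-thickness rule."""
    return SPACING_FACTOR * dielectric_mil


def check_spacing(spacing_mil: float, inter_group: bool) -> str:
    """Classify a trace-to-trace spacing against the guideline values.

    "violation": below the 8 mil minimum.
    "minimum":   at or above the minimum but below the recommended value;
                 acceptable for short segments only.
    "ok":        meets the recommended value.
    """
    typical = INTER_GROUP_TYPICAL if inter_group else INTRA_GROUP_TYPICAL
    if spacing_mil < MIN_SPACING:
        return "violation"
    if spacing_mil < typical:
        return "minimum"
    return "ok"


if __name__ == "__main__":
    print(recommended_spacing(4.0))                 # 4 mil dielectric from Figure 4-4
    print(check_spacing(10.0, inter_group=False))
    print(check_spacing(15.0, inter_group=True))
```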

Crosstalk is a function of spacing, dielectric thickness, and signal slew rate. For signals with slew rates below 1 V/ns, the spacing can be tighter. A low-speed system usually has more timing margin and can tolerate more crosstalk without affecting SI.

### 4.4.5 Length

Provided the other requirements are met, the routing between the FPGA and the DDR3 should be as short as possible. If the length is under 1000 mil (about 2.5 cm), routing is simpler and signal quality is usually better.

In a single-chip design, a length of about 1000 mil is easy to achieve; the length in this design is about 1200 mil. In most cases, however, especially in multi-chip designs, the length exceeds 2000 mil (about 5 cm), which leads to more undershoot, overshoot, ringing, and other effects that degrade signal integrity. In such cases, refer to the proven official reference designs.


### 4.4.6 Equal Length

DDR3 devices have strict requirements on signal length; in practice, length matching is a way of keeping signal delays consistent. The propagation delay of a 1000 mil (1 inch) trace is about 165 ps. For an 800 MHz clock (DDR3-1600), the clock period is 1250 ps, and because data is transferred on both clock edges, the bit period is 625 ps. It is more meaningful to express skew as a percentage of this period: 625 ps × 1% = 6.25 ps, which corresponds to about 40 mil of trace. To match delays to within 1% of the period, the routing lengths must therefore match within 40 mil, that is, within ±20 mil.
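The arithmetic above can be reproduced with a short Python sketch. It assumes the approximate 165 ps per 1000 mil delay quoted in the text (the real value depends on the stackup and dielectric) and treats the skew budget as 1% of the DDR bit period, i.e. half the clock period. The function names are illustrative.

```python
DELAY_PS_PER_MIL = 165.0 / 1000.0  # ~165 ps per 1000 mil (1 inch), as quoted above


def clock_period_ps(clock_mhz: float) -> float:
    """Clock period in ps for a clock frequency given in MHz."""
    return 1e6 / clock_mhz


def bit_period_ps(clock_mhz: float) -> float:
    """DDR transfers data on both clock edges, so one bit lasts half a clock period."""
    return clock_period_ps(clock_mhz) / 2


def skew_budget_mil(clock_mhz: float, fraction: float = 0.01) -> float:
    """Trace-length mismatch whose delay equals `fraction` of the bit period."""
    return fraction * bit_period_ps(clock_mhz) / DELAY_PS_PER_MIL


print(clock_period_ps(800))               # 1250.0
print(bit_period_ps(800))                 # 625.0
print(round(skew_budget_mil(800), 1))     # 37.9
```

The exact result is about 38 mil, which the text rounds to 40 mil and splits into a ±20 mil tolerance.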

For equal-length routing, the vias should also be matched. A via adds length in the z-axis direction, and its actual length depends on the start and end layers of the signal. Because vias differ, no single delay value can be assigned to all of them. The delay caused by a via's parasitic inductance and capacitance exceeds the delay due to the length of the via barrel. The maximum via delay is about 20 ps, which includes both the z-axis delay and the LC-induced delay. Given this complexity, it is recommended that matched signals use the same number of vias with the same parameters.

The equal-length requirements for DDR3 devices are summarized as follows:

- 1. Data intra-group signals: length error within ±20 mil, with the DQS differential pair being the longest in the group.
- 2. Address/control intra-group signals: length error within ±20 mil.
- 3. Inter-group signals: length error within ±50 mil.
- 4. The clock pair should be routed 250 mil (about 42 ps) longer than the average length of the other bus signals. This is because a differential signal has better noise immunity and signal integrity than a single-ended signal, so the differential clock propagates faster than the single-ended signals.
- 5. Differential intra-pair signals: length error within 10 mil is recommended.
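The rules above can be sketched as a length check over exported trace lengths. This is an illustrative Python sketch, not a Gowin or EDA-tool feature: the group names and lengths are hypothetical, "±tol" is interpreted as all lengths fitting inside a ±tol window around a common target, and the inter-group rule is applied to the group average lengths, which is an assumption about how the text intends it. The DQS-longest condition of rule 1 is not checked.

```python
from statistics import mean

# Tolerances from the list above (all values in mil).
INTRA_GROUP_TOL_MIL = 20.0  # rules 1 and 2: +/-20 mil inside a group
INTER_GROUP_TOL_MIL = 50.0  # rule 3: +/-50 mil between groups
DIFF_PAIR_TOL_MIL = 10.0    # rule 5: within a differential pair
CLOCK_OFFSET_MIL = 250.0    # rule 4: clock longer than the bus average


def fits_window(lengths: list[float], tol: float) -> bool:
    """True if all lengths fit in a +/-tol window around one common target,
    which holds exactly when the spread (max - min) is at most 2 * tol."""
    return max(lengths) - min(lengths) <= 2 * tol


def check_groups(groups: dict[str, list[float]]) -> dict[str, bool]:
    """Intra-group rule per group; inter-group rule on the group averages (assumed)."""
    result = {name: fits_window(lens, INTRA_GROUP_TOL_MIL) for name, lens in groups.items()}
    result["inter-group"] = fits_window([mean(lens) for lens in groups.values()], INTER_GROUP_TOL_MIL)
    return result


def diff_pair_ok(p_mil: float, n_mil: float) -> bool:
    """Rule 5: the two legs of a differential pair differ by at most 10 mil."""
    return abs(p_mil - n_mil) <= DIFF_PAIR_TOL_MIL


def clock_target_mil(groups: dict[str, list[float]]) -> float:
    """Rule 4: recommended clock length = average bus length + 250 mil."""
    return mean([x for lens in groups.values() for x in lens]) + CLOCK_OFFSET_MIL


# Hypothetical routed lengths (mil), for illustration only.
groups = {
    "low_byte": [1190.0, 1200.0, 1205.0, 1210.0],
    "high_byte": [1220.0, 1230.0, 1225.0, 1235.0],
    "addr_ctrl": [1180.0, 1195.0, 1200.0, 1210.0],
}
print(check_groups(groups))
print(f"clock target: {clock_target_mil(groups):.0f} mil")
print("CK pair ok:", diff_pair_ok(1460.0, 1452.0))
```

Reducing each ±tol rule to a spread check keeps the sketch independent of which trace is chosen as the reference.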

### 4.4.7 Impedance Matching

A characteristic impedance (Z0) of 50 $\Omega$ ±10% is recommended for all single-ended traces, and a differential impedance of 100 $\Omega$ ±10% for differential traces. Impedances in this range match well with those of the DDR3 and FPGA devices. Z0 is usually specified by the designer, and the PCB manufacturer then adjusts the dielectric thickness and trace width to meet the impedance requirements.
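A quick acceptance check for the ±10% tolerance, such as might be applied to a PCB manufacturer's impedance test report, is sketched below. The function names and the measured values are hypothetical; only the 50 $\Omega$ and 100 $\Omega$ targets and the ±10% tolerance come from the text.

```python
def impedance_window(nominal_ohm: float, tolerance: float = 0.10) -> tuple[float, float]:
    """Return the (min, max) impedance allowed by a +/- tolerance around nominal."""
    return nominal_ohm * (1 - tolerance), nominal_ohm * (1 + tolerance)


def impedance_ok(measured_ohm: float, nominal_ohm: float, tolerance: float = 0.10) -> bool:
    """True if the measured impedance lies inside the tolerance window."""
    low, high = impedance_window(nominal_ohm, tolerance)
    return low <= measured_ohm <= high


SINGLE_ENDED_OHM = 50.0   # target for single-ended traces
DIFFERENTIAL_OHM = 100.0  # target for differential pairs

# Hypothetical measurements (ohm): (nominal, measured).
measurements = {"single-ended": (SINGLE_ENDED_OHM, 53.8), "differential": (DIFFERENTIAL_OHM, 88.5)}
for name, (nominal, measured) in measurements.items():
    low, high = impedance_window(nominal)
    status = "pass" if impedance_ok(measured, nominal) else "fail"
    print(f"{name}: {measured} ohm, allowed {low:.0f} to {high:.0f} ohm -> {status}")
```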

The best signal quality is obtained when the driver impedance matches the trace impedance. The DQ bus of the DDR3 device supports On-Die Termination (ODT), which lets the device dynamically connect or disconnect the termination resistance on the DQ bus. Combined with the programmable drive strength of the DDR3 device, this increases system flexibility and provides more accurate impedance matching in point-to-point systems.

For DQ bus termination, this design uses the DDR3 ODT function. During writes, the DDR3 internal ODT is set to 60 $\Omega$; during reads, the internal ODT is disabled and a drive strength of 34 $\Omega$ or 40 $\Omega$ can be selected, as shown in Figure 4-8.

(Diagram: controller driver with $R_{ON} = 34\,\Omega$; termination resistors $R_{TTC}$.)

Figure 4-8 Data Bus Termination Scheme

Based on simulation, the ADDR/CMD/CNTRL signal lines should be connected to VTT through 40 to 60 $\Omega$ termination resistors to enhance signal driving capability. In this design, the ADDR/CMD/CNTRL lines are terminated with 49.9 $\Omega$ resistors, which has proven advantageous.

## 4.5 Reference Plane

### 4.5.1 Continuous Reference Plane

Signal traces must have a continuous reference plane and must not cross voids (split or cut-out areas) in the reference plane, as shown in Figure 4-9. Except in the signal fan-out area, keep traces at least 30 mil from the edge of voids and via clearance areas in the reference plane, as shown in Figure 4-10. All signal groups must have a complete VSSQ or VDDQ reference plane.

For read and write operations, the key signals are CK/CK#, DQ, DM, and DQS, which run twice as fast as the other signal groups and require higher signal integrity. DQ, DQS, and the clock lines should preferably reference the VSSQ plane to minimize noise. If a VSSQ reference is difficult to provide, address and command signals can reference the VDDQ plane.

Figure 4-9 Trace Spanning Split




Figure 4-10 Margin between Trace and Hollowing out Area (clearance > 30 mil)

In the fan-out area, route the signal midway between two vias and keep it away from the edges of the via clearance (anti-pad) areas in the reference layer, as shown in Figure 4-11.

Figure 4-11 Routing in the Pin Area



### 4.5.2 Reference Plane Stitching

In some areas of the device, ground pins are sparse, which can leave the reference plane discontinuous. Combined with dense signals and small spacing, this can increase crosstalk and cause data errors.

As shown in Figure 4-12, signals can be fanned out to both sides, leaving a via-free area in the middle so that the ground plane can be stitched together. As shown in Figure 4-13, in crowded areas, add ground vias as appropriate and place them, where possible, in the semicircular areas formed by the routing.

Figure 4-12 Ground Plane Stitching



Figure 4-13 Add Ground Via to Crowded Area (Green for Ground, Red for Power Supply)



Figure 4-14 compares simulated eye diagrams of the command/address/control lines with and without ground stitching vias. In the left plot, which uses ground stitching vias, the eye height is 180 mV; in the right plot, without them, the eye height is only 99 mV.



Figure 4-14 Command, Address, Control Line Simulation and Contrast

### 4.5.3 Signal Return Path

The current return path is the most easily overlooked aspect of PCB design. It is particularly important for signals with parallel termination, because the current through the termination resistors is large. Most board-level simulators only consider reference plane boundaries and gaps, not the effect of the return path. Being aware of this problem, careful visual inspection of the layout can achieve good results.

The shorter the signal return path, the smaller the current noise and the stronger the noise immunity. The optimal return path lies directly on the layer adjacent to the signal trace, with a layer spacing of no more than 5 mil; otherwise the signal loop area increases. For the 8-layer stackup in this design (Section 4.1.1), layers 2 and 7 are complete ground planes, so layers 3 and 6 are good choices for signals.

When a signal changes between layers 1 and 3, the return path is barely disturbed. When it changes between layers 1 and 6, the return path is severely disturbed. Although the reference plane is a ground plane in both cases, the return path differs: when the signal changes between layers 1 and 6, the return current must find a path between different reference planes, which increases the loop area. If such a layer change is required, place a ground via near the transition to minimize the loop area.

Add as many ground vias as possible inside and along the edges of the device to provide a good return path for signals and power, especially at the corners of the device, where ground pins are usually sparse.

If the reference plane changes from a ground plane to a power plane, the return path must be bridged by a capacitor between the two planes, i.e., a jumper capacitor, as shown in Figure 4-15. This usually increases the loop area considerably, so avoid it where possible.

Figure 4-15 Jumper Capacitance

## 4.6 Simulation

### 4.6.1 Simulation

Periodic simulation of I/O performance is recommended during the layout of a new or modified design. Simulation allows the interface to be optimized to reduce noise and increase timing margin before a prototype is built. Problems found in simulation are usually easier to fix, whereas problems found after the PCB is built require an expensive and time-consuming board redesign.

Memory device manufacturers provide many types of simulation models for different tools. For example, the component simulation models currently available on Micron's website include IBIS, Verilog, VHDL, Hspice, Denali, and Synopsys models.

It is impractical to verify every possible condition, but the key points to focus on are the DC levels, signal slew rate, undershoot, overshoot, ringing, and waveform shape. It is also important to verify that the eye diagram opening is large enough to meet timing and tolerate power supply noise.

Serpentine routing (snaking) provides the required delay, but self-coupling between adjacent segments of the serpentine can change the propagation delay of the signal. It is recommended to verify timing with a simulation that includes coupling.

Vias can cause timing errors. If every signal in the bus changes layers through the same type of via between the same layers, the effect of the vias can be ignored. If they are mismatched, the extra delay may push the timing margin negative. Vias should therefore be included in simulation, and if the entire bus is simulated, all vias should be modeled. If the simulation does not include the entire bus, compensate for the additional via delay. One rule of thumb is to take the effective path length of a via as twice its physical length.
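The twice-the-physical-length rule of thumb can be folded into a delay estimate for length matching. The Python sketch below is illustrative: it reuses the approximate 165 ps per 1000 mil figure from Section 4.4.6, and the 30 mil via length and 1200 mil trace are hypothetical values (actual via length depends on the start and end layers).

```python
DELAY_PS_PER_MIL = 165.0 / 1000.0  # ~165 ps per 1000 mil, figure quoted in Section 4.4.6
VIA_LENGTH_FACTOR = 2.0            # rule of thumb: effective via path = 2 x physical via length


def via_equivalent_mil(via_length_mil: float) -> float:
    """Effective routing length contributed by one via under the 2x rule."""
    return VIA_LENGTH_FACTOR * via_length_mil


def net_delay_ps(trace_mil: float, via_lengths_mil: list[float]) -> float:
    """Propagation delay of a net: trace length plus the effective length of its vias."""
    effective_mil = trace_mil + sum(via_equivalent_mil(v) for v in via_lengths_mil)
    return effective_mil * DELAY_PS_PER_MIL


def trace_trim_mil(extra_via_lengths_mil: list[float]) -> float:
    """Trace length to remove from a net that has extra vias compared with its group."""
    return sum(via_equivalent_mil(v) for v in extra_via_lengths_mil)


# Hypothetical example: two 1200 mil nets, one of which changes layers
# through a single via assumed to be 30 mil long.
direct = net_delay_ps(1200.0, [])
with_via = net_delay_ps(1200.0, [30.0])
print(f"extra delay {with_via - direct:.1f} ps; trim {trace_trim_mil([30.0]):.0f} mil to compensate")
```

For this assumed via the extra delay is about 9.9 ps (60 mil of equivalent trace), which stays below the 20 ps maximum via delay mentioned in Section 4.4.6; a simulation with real via models should still be preferred.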

### 4.6.2 Timing Budget Design

It is recommended to approach the hardware design from a timing-budget perspective; where the budget leaves adequate timing margin, this increases the flexibility of placement and routing. Starting from simulation, the setup and hold times of the signal under ideal conditions can be read from the eye diagram at the memory device. The parameters not included in the simulation are then accounted for, bringing the result closer to the actual operating environment.

A timing budget must show that the system still works correctly once all parameters are accounted for. The higher the speed or the more complex the system, the harder it is to meet the timing budget. For designs that deviate from these guidelines, first check whether the timing budget has enough margin. The following table lists the parameters commonly included in an address bus timing budget.

| Parameter                                          | Setup (ps) | Hold (ps) |
|----------------------------------------------------|------------|-----------|
| Value obtained from the ideal simulation           | 476        | 651       |
| Setup and hold requirements from the DDR3 datasheet | 45         | 120       |
| Reduction calculated from the signal slew rate     | 2.3        | 2.8       |
| VREFCA-related offset                              | 13         | 11        |
| DDR3 derating                                      | 88         | 50        |
| Crosstalk                                          | 47         | 42        |
| Controller error                                   | 200        | 200       |
| Clock error                                        | 30         | 30        |
| Routing error                                      | 10         | 10        |
| Margin                                             | 41         | 185       |

The margin is the ideal value minus the sum of all other parameters. A positive result means there is margin. If the setup and hold margins differ greatly, offset the clock to balance them. The margins in the table above are acceptable.
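The margin arithmetic can be reproduced directly from the table. The Python sketch below (values in ps, copied from the table) subtracts every budget item from the ideal simulated value. As an illustration only, it also computes the clock shift that would equalize the two margins, under the simplifying assumption that delaying the clock moves time from the hold margin to the setup margin one-for-one.

```python
# Address-bus timing budget from the table above, in ps: (setup, hold).
IDEAL_PS = (476.0, 651.0)  # value obtained from the ideal simulation
BUDGET_PS = {
    "DDR3 setup/hold requirement": (45.0, 120.0),
    "Slew-rate reduction": (2.3, 2.8),
    "VREFCA offset": (13.0, 11.0),
    "DDR3 derating": (88.0, 50.0),
    "Crosstalk": (47.0, 42.0),
    "Controller error": (200.0, 200.0),
    "Clock error": (30.0, 30.0),
    "Routing error": (10.0, 10.0),
}


def margins(ideal: tuple[float, float], budget: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Margin = ideal value minus every other budget item, for setup and hold."""
    setup = ideal[0] - sum(s for s, _ in budget.values())
    hold = ideal[1] - sum(h for _, h in budget.values())
    return setup, hold


def balancing_clock_shift(setup: float, hold: float) -> float:
    """Clock delay (ps) that would equalize the margins, assuming (simplification)
    that delaying the clock adds to setup margin and removes from hold margin 1:1."""
    return (hold - setup) / 2


setup, hold = margins(IDEAL_PS, BUDGET_PS)
print(f"setup margin {setup:.1f} ps, hold margin {hold:.1f} ps")  # setup margin 40.7 ps, hold margin 185.2 ps
print(f"balancing clock shift {balancing_clock_shift(setup, hold):.1f} ps")
```

The unrounded margins, 40.7 ps and 185.2 ps, round to the 41 ps and 185 ps shown in the table.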


## 4.7 Conclusion

Signal integrity, power supply, routing, and decoupling are the major concerns in the design.

For each application, design analysis and simulation should be carried out before layout to achieve better functionality and stability.


# 5 Notes

- 1. When using memory devices from other manufacturers or of other types, first read the official design guide carefully.
- 2. For designs with multiple memory devices, refer to the official documentation.


